NSF PAR Search | NSF Public Access Repository

Can Large Language Models Verify System Software? A Case Study Using FSCQ as a Benchmark

https://doi.org/10.1145/3713082.3730382

Qin, Jianxing; Du, Alexander; Zhang, Danfeng; Lentz, Matthew; Zhuo, Danyang (May 2025, ACM)

Free, publicly-accessible full text available May 14, 2026
Rethinking RPC Communication for Microservices-based Applications

https://doi.org/10.1145/3713082.3730375

Zhu, Xiangfeng; Zhou, Yang; Wang, Yuyao; Gao, Xiangyu; Krishnamurthy, Arvind; Kumar, Sam; Mahajan, Ratul; Zhuo, Danyang (May 2025, ACM)

Free, publicly-accessible full text available May 14, 2026
High-level Programming for Application Networks

Zhu, Xiangfeng; Wang, Yuyao; Liu, Banruo; Wu, Yongtong; Bojanic, Nikola; Chen, Jingrong; Bernstein, Gilbert; Krishnamurthy, Arvind; Kumar, Sam; Mahajan, Ratul; et al (April 2025, USENIX)

Application networks facilitate communication between the microservices of cloud applications. They are built today using service meshes with low-level specifications that make it difficult to express application-specific functionality (e.g., access control based on RPC fields), and they can more than double the RPC latency. We develop AppNet, a framework that makes it easy to build expressive and high-performance application networks. Developers specify rich RPC processing in a high-level language with generalized match-action rules and built-in state management. We compile the specifications to high-performance code after optimizing where (e.g., client, server) and how (e.g., RPC library, proxy) each RPC processing element runs. The optimization uses symbolic abstraction and execution to judge whether different runtime configurations of possibly-stateful RPC processing elements are semantically equivalent for arbitrary RPC streams. Our experiments show that AppNet can express common application network functions in only 7-28 lines of code. Its optimizations lower RPC processing latency by up to 82%.
Free, publicly-accessible full text available April 28, 2026
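
The "match-action rules with built-in state" idea in the abstract above can be illustrated with a minimal Python sketch. This is not AppNet's language or API: the `Element` class, its `rule` and `process` methods, and the `user` RPC field are hypothetical names chosen only for illustration. The sketch shows an access-control element that drops RPCs from users without write permission.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Rpc = Dict[str, str]          # an RPC modeled as a flat field map (assumption)
State = Dict[str, str]        # per-element key/value state (assumption)
Match = Callable[[Rpc, State], bool]
Action = Callable[[Rpc, State], Optional[Rpc]]


@dataclass
class Element:
    """Hypothetical RPC processing element: ordered match-action rules plus state."""

    state: State = field(default_factory=dict)
    rules: List[Tuple[Match, Action]] = field(default_factory=list)

    def rule(self, match: Match, action: Action) -> None:
        """Register a rule; rules are tried in registration order."""
        self.rules.append((match, action))

    def process(self, rpc: Rpc) -> Optional[Rpc]:
        """Apply the first matching rule; return None to signal a dropped RPC."""
        for match, action in self.rules:
            if match(rpc, self.state):
                return action(rpc, self.state)
        return rpc  # no rule matched: forward the RPC unchanged


# Access control based on an RPC field: drop callers lacking "write" permission.
acl = Element(state={"alice": "write", "bob": "read"})
acl.rule(
    lambda rpc, st: st.get(rpc.get("user", "")) != "write",
    lambda rpc, st: None,
)
```

With these definitions, `acl.process({"user": "alice"})` forwards the RPC and `acl.process({"user": "bob"})` returns `None`. Where such an element runs (client, server, library, or proxy) is the part the abstract says AppNet optimizes; the sketch does not model that.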
High-level Programming for Application Networks

Zhu, Xiangfeng; Wang, Yuyao; Liu, Banruo; Wu, Yongtong; Bojanic, Nikola; Chen, Jingrong; Bernstein, Gilbert L; Krishnamurthy, Arvind; Kumar, Sam; Mahajan, Ratul; et al (April 2025, USENIX NSDI)

Free, publicly-accessible full text available April 28, 2026
MCCS: A Service-based Approach to Collective Communication for Multi-Tenant Cloud

https://doi.org/10.1145/3651890.3672252

Wu, Yongji; Xu, Yechen; Chen, Jingrong; Wang, Zhaodong; Zhang, Ying; Lentz, Matthew; Zhuo, Danyang (August 2024, ACM)

Full Text Available
Fairness in Serving Large Language Models

Sheng, Ying; Cao, Shiyi; Li, Dacheng; Zhu, Banghua; Li, Zhuohan; Zhuo, Danyang; Gonzalez, Joseph E; Stoica, Ion (July 2024, 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24))

High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading. To process all client requests fairly, most major LLM inference services impose request rate limits so that no client can dominate the request queue. However, this rudimentary notion of fairness also results in under-utilization of the resources and poor client experience when there is spare capacity. While there is a rich literature on fair scheduling, serving LLMs presents new challenges due to their unpredictable request lengths and their unique batching characteristics on parallel accelerators. This paper introduces the definition of LLM serving fairness based on a cost function that accounts for the number of input and output tokens processed. To achieve fairness in serving, we propose a novel scheduling algorithm, the Virtual Token Counter (VTC), a fair scheduler based on the continuous batching mechanism. We prove a 2× tight upper bound on the service difference between two backlogged clients while satisfying the work-conserving requirement. Through extensive experiments, we demonstrate the superior performance of VTC in ensuring fairness, especially in contrast to other baseline methods, which exhibit shortcomings under various conditions. The reproducible code is available at https://github.com/Ying1123/VTC-artifact.
Full Text Available
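
The token-based cost function and counter-driven scheduling described in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation (see the linked artifact for that): the class name, method names, and the weights `w_in=1.0` and `w_out=2.0` are assumptions, and continuous batching is not modeled. Each client has a counter of weighted tokens served; the scheduler always picks the backlogged client with the smallest counter, and a client that rejoins after being idle is lifted to the current minimum so it cannot claim credit for its idle time.

```python
from collections import defaultdict, deque


class VirtualTokenCounterSketch:
    """Toy fair scheduler: serve the backlogged client with the smallest counter."""

    def __init__(self, w_in=1.0, w_out=2.0):
        self.w_in = w_in      # assumed cost per input token
        self.w_out = w_out    # assumed cost per output token
        self.counters = defaultdict(float)  # weighted tokens served per client
        self.queues = defaultdict(deque)    # pending input lengths per client

    def _backlogged(self):
        return [c for c, q in self.queues.items() if q]

    def submit(self, client, input_tokens):
        """Queue a request; lift a newly backlogged client to the current minimum."""
        active = self._backlogged()
        if client not in active and active:
            floor = min(self.counters[c] for c in active)
            self.counters[client] = max(self.counters[client], floor)
        self.queues[client].append(input_tokens)

    def next_request(self):
        """Pop a request from the least-served backlogged client, or return None."""
        active = self._backlogged()
        if not active:
            return None  # nothing queued
        client = min(active, key=lambda c: (self.counters[c], c))
        input_tokens = self.queues[client].popleft()
        self.counters[client] += self.w_in * input_tokens  # charge input cost
        return client, input_tokens

    def record_output(self, client, output_tokens):
        """Charge generated tokens as they are produced."""
        self.counters[client] += self.w_out * output_tokens
```

Because `next_request` returns work whenever any queue is non-empty, the sketch never idles while requests are pending, which is the work-conserving property the abstract refers to. Unlike a fixed rate limit, a lone client can use all spare capacity.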
Enoki: High Velocity Linux Kernel Scheduler Development

https://doi.org/10.1145/3627703.3629569

Miller, Samantha; Kumar, Anirudh; Vakharia, Tanay; Chen, Ang; Zhuo, Danyang; Anderson, Thomas (April 2024, ACM)

Kernel task scheduling is important for application performance, adaptability to new hardware, and complex user requirements. However, developing, testing, and debugging new scheduling algorithms in Linux, the most widely used cloud operating system, is slow and difficult. We developed Enoki, a framework for high-velocity development of Linux kernel schedulers. Enoki schedulers are written in safe Rust, and the system supports live upgrade of new scheduling policies into the kernel, userspace debugging, and bidirectional communication with applications. A scheduler implemented with Enoki achieved near-identical performance (within 1% on average) to the default Linux scheduler CFS on a wide range of benchmarks. Enoki is also able to support a range of research schedulers, specifically the Shinjuku scheduler, a locality-aware scheduler, and the Arachne core arbiter, with good performance.
Full Text Available
Harmonic: Hardware-assisted RDMA Performance Isolation for Public Clouds

Lou, Jiaqi; Kong, Xinhao; Huang, Jinghan; Bai, Wei; Kim, Nam Sung; Zhuo, Danyang (April 2024, USENIX NSDI)

Full Text Available
Application Defined Networks

https://doi.org/10.1145/3626111.3628178

Zhu, Xiangfeng; Deng, Weixin; Liu, Banruo; Chen, Jingrong; Wu, Yongji; Anderson, Thomas; Krishnamurthy, Arvind; Mahajan, Ratul; Zhuo, Danyang (November 2023, ACM)

HotNets'23.
Remote Direct Memory Introspection

Liu, Hongyi; Xing, Jiarong; Huang, Yibo; Zhuo, Danyang; Devadas, Srinivas; Chen, Ang (August 2023, 32nd USENIX Security Symposium)

Hypervisors have played a critical role in cloud security, but they introduce a large trusted computing base (TCB) and incur a heavy performance tax. As of late, hypervisor offloading has become an emerging trend, where privileged functions are sunk into specially-designed hardware devices (e.g., Amazon’s Nitro, AMD’s Pensando) for better security with closer-to-baremetal performance. In light of this trend, this project rearchitects a classic security task that is often relegated to the hypervisor, memory introspection, while only using widely-available devices. Remote direct memory introspection (RDMI) couples two types of commodity programmable devices in a novel defense platform. It uses RDMA NICs for efficient memory access and programmable network devices for efficient computation, both operating at ASIC speeds. RDMI also provides a declarative language for users to articulate the introspection task, and its compiler automatically lowers the task to the hardware substrate for execution. Our evaluation shows that RDMI can protect baremetal machines without requiring a hypervisor, introspecting kernel state and detecting rootkits at high frequency and zero CPU overhead.
Full Text Available